Shenyang
- North America > United States (0.29)
- North America > Canada (0.16)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Asia > China > Liaoning Province > Shenyang (0.04)
- Transportation > Infrastructure & Services (0.31)
- Transportation > Ground > Road (0.31)
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
- Asia > India > NCT > Delhi (0.04)
- (10 more...)
- Research Report (0.46)
- Overview (0.46)
- Health & Medicine (1.00)
- Information Technology (0.67)
- Law > Environmental Law (0.46)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Asia > China > Liaoning Province > Shenyang (0.04)
- North America > United States > California (0.14)
- Asia > China > Liaoning Province > Shenyang (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Workflow (0.46)
- Research Report (0.46)
- Information Technology (0.68)
- Government (0.46)
TGB 2.0: A Benchmark for Learning on Temporal Knowledge Graphs and Heterogeneous Graphs
Multi-relational temporal graphs are powerful tools for modeling real-world data, capturing the evolving and interconnected nature of entities over time. Recently, many novel models are proposed for ML on such graphs intensifying the need for robust evaluation and standardized benchmark datasets. However, the availability of such resources remains scarce and evaluation faces added complexity due to reproducibility issues in experimental protocols. To address these challenges, we introduce Temporal Graph Benchmark 2.0 (TGB 2.0), a novel benchmarking framework tailored for evaluating methods for predicting future links on Temporal Knowledge Graphs and Temporal Heterogeneous Graphs with a focus on large-scale datasets, extending the Temporal Graph Benchmark.
Towards Better Evaluation for Dynamic Link Prediction
Despite the prevalence of recent success in learning from static graphs, learning from time-evolving graphs remains an open challenge. In this work, we design new, more stringent evaluation procedures for link prediction specific to dynamic graphs, which reflect real-world considerations, to better compare the strengths and weaknesses of methods. First, we create two visualization techniques to understand the reoccurring patterns of edges over time and show that many edges reoccur at later time steps. Based on this observation, we propose a pure memorization-based baseline called EdgeBank. EdgeBank achieves surprisingly strong performance across multiple settings which highlights that the negative edges used in the current evaluation are easy. To sample more challenging negative edges, we introduce two novel negative sampling strategies that improve robustness and better match real-world applications. Lastly, we introduce six new dynamic graph datasets from a diverse set of domains missing from current benchmarks, providing new challenges and opportunities for future research. Our code repository is accessible at https://github.com/fpour/DGB.git.
Temporal Graph Benchmark for Machine Learning on Temporal Graphs
We present the Temporal Graph Benchmark (TGB), a collection of challenging and diverse benchmark datasets for realistic, reproducible, and robust evaluation of machine learning models on temporal graphs. TGB datasets are of large scale, spanning years in duration, incorporate both node and edge-level prediction tasks and cover a diverse set of domains including social, trade, transaction, and transportation networks. For both tasks, we design evaluation protocols based on realistic use-cases. We extensively benchmark each dataset and find that the performance of common models can vary drastically across datasets. In addition, on dynamic node property prediction tasks, we show that simple methods often achieve superior performance compared to existing temporal graph models. We believe that these findings open up opportunities for future research on temporal graphs. Finally, TGB provides an automated machine learning pipeline for reproducible and accessible temporal graph research, including data loading, experiment setup and performance evaluation. TGB will be maintained and updated on a regular basis and welcomes community feedback. TGB datasets, data loaders, example codes, evaluation setup, and leaderboards are publicly available at https://tgb.complexdatalab.com/.
- Asia > Middle East > Jordan (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Virginia (0.04)
- (17 more...)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- (8 more...)
RoFt-Mol: Benchmarking Robust Fine-Tuning with Molecular Graph Foundation Models
Liu, Shikun, Zou, Deyu, Shoghi, Nima, Fung, Victor, Liu, Kai, Li, Pan
In the era of foundation models, fine-tuning pre-trained models for specific downstream tasks has become crucial. This drives the need for robust fine-tuning methods to address challenges such as model overfitting and sparse labeling. Molecular graph foundation models (MGFMs) face unique difficulties that complicate fine-tuning. These models are limited by smaller pre-training datasets and more severe data scarcity for downstream tasks, both of which require enhanced model generalization. Moreover, MGFMs must accommodate diverse objectives, including both regression and classification tasks. To better understand and improve fine-tuning techniques under these conditions, we classify eight fine-tuning methods into three mechanisms: weight-based, representation-based, and partial fine-tuning. We benchmark these methods on downstream regression and classification tasks across supervised and self-supervised pre-trained models in diverse labeling settings. This extensive evaluation provides valuable insights and informs the design of a refined robust fine-tuning method, ROFT-MOL. This approach combines the strengths of simple post-hoc weight interpolation with more complex weight ensemble fine-tuning methods, delivering improved performance across both task types while maintaining the ease of use inherent in post-hoc weight interpolation.
- North America > United States (0.13)
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
- Asia > China > Liaoning Province > Shenyang (0.04)
- Asia > China > Hong Kong (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.92)
Smaller Models, Smarter Rewards: A Two-Sided Approach to Process and Outcome Rewards
Groeneveld, Jan Niklas, Qin, Xi, Schaefer, Alexander, Oren, Yaad
Generating high-quality code remains a challenge for Large Language Models (LLMs). For the evolution of reasoning models on this task, reward models are a necessary intermediate step. These models judge outcomes or intermediate steps. Decoder-only transformer models can be turned into reward models by introducing a regression layer and supervised fine-tuning. While it is known that reflection capabilities generally increase with the size of a model, we want to investigate whether state-of-the-art small language models like the Phi-4 family can be turned into usable reward models blending the consideration of process rewards and outcome rewards. Targeting this goal, we construct a dataset of code samples with correctness labels derived from the APPS coding challenge benchmark. We then train a value-head model to estimate the success probability of intermediate outputs. Our evaluation shows that small LLMs are capable of serving as effective reward models or code evaluation critics, successfully identifying correct solutions among multiple candidates. Using this critic, we achieve over a 20% improvement in the search capability of the most accurate code out of multiple generations.
- North America > United States > California > Orange County > Irvine (0.14)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > China > Liaoning Province > Shenyang (0.04)